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Putting the Precision in Precision Cosmology: 

How accurate should your data covariance matrix be? 
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ABSTRACT 

Cosmological parameter estimation requires that the likelihood function of the data 
is accurately known. Assuming that cosmological large-scale structure power spectra 
data are multivariate Gaussian-distributed, we show the accuracy of parameter esti- 
mation is limited by the accuracy of the inverse data covariance matrix - the precision 
matrix. If the data covariance and precision matrices are estimated by sampling in- 
dependent realisations of the data, their statistical properties are described by the 
Wishart and Inverse- Wishart distributions, respectively. Independent of any details of 
the survey, we show that the fractional error on a parameter variance, or a Figure- 
of-Merit, is equal to the fractional variance of the precision matrix. In addition, for 
the only unbiased estimator of the precision matrix, we find that the fractional ac- 
curacy of the parameter error depends only on the difference between the number of 
independent realisations and the number of data points, and so can easily diverge. 
For a 5% error on a parameter error and Nd <C 10 2 data-points, a minimum of 200 
realisations of the survey are needed, with 10% accuracy in the data covariance. If 
the number of data-points Nd 3> 10 2 we need N$ > Nd realisations and a fractional 
accuracy of < ^2/Nd in the data covariance. As the number of power spectra data 
points grows to Nd > 10 4 - 10 6 this approach will be problematic. We discuss possi- 
ble ways to relax these conditions: improved theoretical modelling; shrinkage methods; 
data-compression; simulation and data resampling methods. 
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1 INTRODUCTION 

A central part of modern cosmology is the measurement 
of the parameters that characterise cosmological models of 
the Universe. These can be the set that constitutes the 
Standard Cosmological Model (fl m , fib, Ha, Ho, as,n s ,r), or 
an extended set that characterise, for example, more com- 
plex dark energy models (see e.g., Copeland, Sami & Tsu- 
jikawa, 2006, Amendola et al., 2012, for reviews), devia- 
tions from Einstein gravity (e.g., Clifton, Ferreira, Padillo, 
Skordis, 2012; Amendola et al., 2012 for recent reviews), 
more detail about the inflationary epoch (e.g., Amendola 
et al, 2012), isocurvature density and velocity modes (e.g., 
Bucher, Moodley & Turok, 2001), or massive neutrinos and 
their abundance (e.g., Bird, Viel & Haehnelt, 2012, and 
references therein). Furthermore, if we want to differenti- 
ate between theoretical models in a Bayesian framework, 
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as well as estimate their parameter value, we also need to 
accurately integrate over the model parameter-space (e.g., 
Trotta, 2007; Liddle, Mukherjee, Parkinson, 2006; Taylor & 
Kitching, 2010). 

To carry these tasks out we need both accurate theo- 
retical predictions of the physical properties of the model 
to compare to the data, and sufficiently accurate models 
of their statistical properties. Ideally, we would like to be 
able to accurately predict the full multivariate probability 
distribution of the data for each model. If, as is commonly 
assumed, the data can be modelled as a multivariate Gaus- 
sian distribution, all of the statistical properties of the model 
reside in the mean and covariance of the model. Attention 
has been focussed on the accuracy of the predictions of the 
mean value - e.g., the model power spectra - and the ef- 
fect of biases or errors in the mean (e.g., Huterer & Takada, 
2005; Huterer, et al, 2006; Taylor et al., 2007). But to fully 
specify the distribution of the data we also need accurate 
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predictions of the data covariance matrix and the inverse of 
the data covariance - the precision matrix. 

If we assume that the mean is well-known, the accuracy 
of the probability distribution of the data, and hence the 
likelihood function in parameter-space, is determined by the 
accuracy of the precision matrix. However, as yet there is no 
unique approach to estimating the data covariance matrix 
since this may depend on the details of what is known about 
the data, and even less attention paid to the estimation of 
the precision matrix. 

The data covariance matrix can be estimated in a num- 
ber of ways: direct calculation of a theoretical model; esti- 
mate the sample covariance from an ensemble of simulations 
of the data; or estimate the sample covariance matrix from 
the data itself. If we know the data covariance from the theo- 
retical model accurately enough, there is no statistical uncer- 
tainty, and the precision matrix can be accurately estimated. 
But if the data covariance matrix must be sampled from an 
ensemble of simulations, or the data itself, there will be sta- 
tistical uncertainty in the sample covariance. If we assume 
the underlying data is Gaussian-distributed and the sam- 
ples are independent and drawn from the same distribution, 
the probability distribution of the sample covariance matrix 
is known, and was first derived by Wishart (1928; see also 
e.g., Press, 1982). To fully specify the model distribution of 
the data we also require the precision matrix. The distribu- 
tion of the precision matrix, the Inverse- Wishart distribu- 
tion (e.g., Press, 1982), has significantly different properties 
from distribution of the covariance matrix. The Wishart dis- 
tributions has previously been discussed in cosmology as the 
distribution of Cosmic Microwave Background (CMB) tem- 
perature and polarisation power spectra (Percival & Brown, 
2006), while the Inverse- Wishart distribution has been used 
as a prior for Bayesian estimates of the CMB temperature 
and polarisation power spectra (Eriksen & Wehus, 2009), for 
Gibbs sampling (Larson et al., 2007), and to test Pseudo-Cl 
methods (Hamimeche & Lewis, 2009). 

In this paper we develop a new framework to estimate 
the statistical error on the data covariance and precision 
matrix (Section 3). We illustrate these effects on simulated 
data (Section 4) and discuss the implications for imminent 
and future large-scale structure surveys in cosmology. These 
effects are propagated into the accuracy of parameter er- 
rors, and the parameter covariance around the peak of the 
likelihood surface (Section 5). Since many experiments use 
the 2-parameter Figure-of-Merit (FoM) as a target measure 
for survey design, we also discuss the accuracy of an arbi- 
trary FoM (Section 6). Given a prescribed accuracy for the 
parameter covariance matrix, or a FoM, we show how accu- 
rate the precision matrix and data covariance matrix must 
be. Finally, we discuss ways in which we avoid these bounds 
by improved theoretical modelling of the data covariance, 
rapid simulation production, or using data compression and 
shrinkage methods (Section 7). We begin by reviewing pa- 
rameter estimation and the role of the precision matrix. 



2 PARAMETER ESTIMATION 

To begin with we shall assume that the cosmological pa- 
rameters, 0, being measured are estimated from maximis- 
ing a posterior parameter distribution, p(6\D,M), given a 



dataset, D, and some theoretical model, M (see, e.g., Sivia, 
1996). From Bayes Theorem, 



P(0\D,M) 



L(D\0,M)n(e\M) 
E(D\M) ' 



(1) 



we can determine the posterior parameter distribution from 
the likelihood function for the data, L(D\0, M), predicted 
by the model, a prior, n(9\M), which is the probability dis- 
tribution of the parameters before the data is analysed, and 
normalised by the evidence, E(D\M), which marginalises 
over the likelihood and prior in parameter-space. If we re- 
strict our study to parameter estimation for a given model, 
we can ignore this term. We shall assume the prior on the 
parameters is flat. 

If we model the data distribution as a multivariate 
Gaussian, then the likelihood function can be written 



L(D\fj,,M,M) = 
where 



^=exp-iTrW*, (2) 

(2n) N °/ 2 J]W\ 2 



W = ADAD*, 

a superscript, t, indicates a transpose, 
AD = D -(D) 



(3) 



(4) 



is the variation in the data-vector, p. = (D) is the mean of 
the data and No is the length of the data- vector. The data 
covariance matrix is given by 



M = (W) = (ADAD*). 



(5) 



We define \M\ — dctiW as the determinant. Comparing 
with a multivariate Gaussian we see that the matrix, \E', is 
the inverse of the data covariance matrix; 



* = M" 



(6) 



As we shall find this matrix is central to our analysis, we 
shall define the inverse data covariance as the precision ma- 
trix. The model dependence on cosmological model param- 
eters, 0, may lie in either the mean, /i = /i(0), or the data 
covariance matrix, M = M(9), or both. Throughout we 
shall assume that the cosmological parameter dependence 
lies only in the mean. In Appendix A we describe the data 
vectors commonly used in cosmological large-scale structure 
analysis: galaxy redshift surveys, cosmic microwave back- 
ground experiments and weak lensing surveys. Through- 
out, we shall assume that the data is a set of power spec- 
tra estimated from the data, although of course our results 
hold for correlation functions and are general to Gaussian- 
distributed data. 



3 COVARIANCE AND PRECISION 
3.1 Data covariance matrix 

If we have a physical model for the covariance matrix, we 
would choose to use this. However, the statistical proper- 
ties of the data may be poorly understood, for example the 
nonlinear regime for galaxy redshift surveys and weak lens- 
ing, and galaxy bias in redshift surveys, or the data may 
have been processed in ways which are not straightforward 
to model analytically, e.g., in CMB data where long- wave 
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variations in the time-ordered-data may have to be removed 
via polynomial fits, which can alter the statistical properties. 
In these cases we use an ensemble of simulations to estimate 
the sample data covariance matrix. In surveys where we do 
not know how to accurately simulate the data, we can use 
the data itself to estimate the data covariance. We return to 
this issue in Section 7. 

If we generate Ns independent realisations of the data, 
D a , where each realisation is labelled by a Greek index, 
a, ft, . . ., and adopt a convention of labelling the data- vector 
so that the Roman indices, i,j,..., indicates the wavenum- 
ber, i, or wavevector, k, and redshifts z,z',..., the data- 
vector averaged over the realisations is 



1 N S 

The expectation value of the data-vector is 
(D a ) = (D) = fi. 



(7) 



(8) 



For independent and identically distributed realisations, and 
where we can use a symmetry or binning of the data to 
average over iV mo des with the same mean value, the accuracy 
of the estimate of the mean data- vector will scale as 



a(D) = 



N s N n 



(9) 



An unbiased estimator for the data covariance matrix is the 
sample data covariance, M, from an ensemble of Ns inde- 
pendent and identically distributed realisations; 



N s 



M 

where 

AD a =D a -D a 



(10) 



(11) 



is the variation in the data for each realisation, and v is the 
number of degrees-of-freedom in the ensemble. If the esti- 
mated mean of the data-vector is know to be the expected 
mean, D a = (D) then 



N s . 



(12) 



However, if the mean is estimated from the data itself, we 
reduce the number degrees-of-freedom by one, so that 



v = Ns — 1. 



3.2 The Wishart distribution 



(13) 



The statistical properties of the sample data covariance 
matrix, assuming the variations in the measured field are 
Gaussian-distributed, are given by the Wishart distribution 
(Wishart, 1928), which generalises the x 2 -distribution; 



p(M\M,v,T)) 



i/"7/ 2 |M|-"/ 2 |M|t/ 2 



iTrMM- 



2 vr l/2r v [v/2] 
where \M\ = dct M is the determinant of M, 



r, ©=^- 1)/4 I[rg 



+ 



1 - s 



,(14) 



(15) 



is the multivariate Gamma function (see Appendix Bl for 
a definition), 77 — Nd is the size of the data- vector, M and 
M are 77 x 77 matrices, v is again the number of degrees of 
freedom of M, and 7 = v — 77 — 1. We require that v > 77, 
to ensure the estimated data covariance matrix is positive 
definite. 

For a single data point, where 77 = 1, the Wishart dis- 
tribution is the reduced-x 2 distribution, 



p{yW) = (0 



vy/2 



I>/2] 



(16) 



where y = M\\jM\\ = \ 2 As with mean (y) = 1 and vari- 
ance <J 2 {y) — 2/v. 

The mean of the general Wishart distribution is 

(M) = M, (17) 

showing it is indeed an unbiased estimate of the covariance 
matrix, while the covariance of M is 



(AMij AMmn) = -(M im M jn + M in M jm ). 



(18) 



This result can also be derived from the Gaussian four-point 
function or directly from the Wishart distribution (see Ap- 
pendix B2 where we calculate the characteristic function for 
the Wishart). 

3.3 The precision matrix and Inverse- Wishart 

The simplest estimator for the precision matrix is 



* = v 



N S 



(19) 



where v is the number of degrees-of-freedom. This estimator 
follows an inverse, or inverted, Wishart distribution (see e.g., 
Press, 1982), 



p(*|*,i/, y) 



,i"7/2|v|>|-/9/2|v|>|iVS 

2 i "'/ 2 r^[i//2] 



e -|TV* * (20) 



where P = v + ij + 1, and 77 = Nd is the size of the data- 
vector. We derive the Inverse- Wishart distribution in Ap- 
pendix B3. 

For a single data point, 77 = 1, the Inverse- Wishart re- 
duces to the inverse-x 2 distribution, 



p{x\v) = 



v/2 



y/2-1 



T[v/2] 



-v/2x 



(21) 



where x = Mn/Mn = v/x 2 . The mean of this distribution 
is 

v 



(x) 



u>2 



and its variance is given by 
a (x) 



v > 4. 



(22) 



(23) 



(v- 2) 2 (^-4)' 

Immediately we see that the inverse distribution has differ- 
ent properties to x 2 . Not only is the mean of the inverse- 
distribution biased high, (x) > 1, but both the mean and 
variance can diverge. 

We can understand the behaviour of the inverse-x 2 by 
considering the underlying Gaussian field. If AD a is the 
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one data point, sampled N$ times, the sample variance is 
Mn = ^■^a/ v - As the fields are Gaussian, they 

will fluctuate symmetrically around AD a = 0. Squaring and 
summing will produce positive values with mean (AD 2 } = 
Mn. However the sample variance will scatter around this, 
bounded from below by zero. When we invert the sample 
variance, some of the values which are close to zero will 
become arbitrarily large. As there are no compensating small 
values, these large values will bias the mean of the precision 
matrix high, skewing the distribution. 

The expectation value of the sampled precision matrix is 
biased and, assuming the mean in unknown, given by (Kauf- 
man, 1967; see Press, 1982; Anderson, 2003; see also Hartlap 
et al., 2007, for a first application to cosmology) 



(*) = 



N S -1 



Ns — Nd — 2 



(24) 



If Ns > Nd + 2 is not satisfied then the values of M are not 
positive-definite and its inverse is undefined. If we do satisfy 
this condition then the bias on the inverse can be corrected 
to yield an unbiased estimate of the precision matrix given 
by 



^unbiased — 



Ns — Nd — 2 
N s - 1 



(25) 



In fact, this estimator for the precision matrix is the only 
unbiased estimator. 

The covariance of the sample precision matrix, again 
assuming the mean is unknown, is (Kaufman, 1967; see also 
Press, 1982, Matsumoto, 2011) 



A 2* ij * m „ + [Ns -N D -2) {^im^jn + * l n* 3 



where 



A = 



(N S - l) 2 



(N s -N D - 1)(N s -N d - 2) 2 (AT 5 - N D - 4) ' 



,(26) 



(27) 



The second term in equation (26) has the same form as 
the covariance for the data covariance matrix of Gaussian- 
distributed variables, and dominates when Ns >• Nd + 2. 
This arises for large numbers of realisations as the Central- 
Limit Theorem will tend to make the Inverse- Wishart Gaus- 
sian distributed. The first term in equation (26) arises from 
the shift in the biased mean of the precision matrix. 

The covariance matrix of the sample precision matrix is 
also biased high. If uncorrected, it leads to an overestimate 
of the uncertainty in the precision matrix and parameter er- 
rors. We can correct for the biases in the mean and covari- 
ance of the sample precision matrix, assuming the number 
of degrees-of-freedom is known and Ns > Nd + 2. The un- 
biased covariance of the precision matrix can be found by 
substituting the prefactor, A, in equation (26) by a corrected 
factor; 



(28) 



(Ns-N D -l)(Ns-N D -4)' 



The unbiased variance of the elements of the estimated pre- 
cision matrix is 

°Lr [%] = A corr [(N s - N D ) *?j + (N s - N D - 2) *« * 33 ] , (29) 



where no summation over repeated indices is implied. A use- 
ful expression is the trace of this variance, which is given by 



Tra 2 orr [*] 



(Ns — Nd — 4) 



(30) 



In the limit that Ns 3> Nd + 4 the uncertainty in the preci- 
sion matrix fall s off l ike the uncertainty on the data covari- 
ance matrix, \J 2 /Ns ■ However, unlike the sample covariance 
matrix, the uncertainty on the sample precision matrix di- 
verges when the number of realisations is close to the number 
of data points, while the sample covariance matrix is singu- 
lar for fewer realisations. Hence, we find two closely related 
requirements. The number of independent realisations used 
to estimate the data covariance and precision matrix, Ns, 
must be larger than the total number of data-points, Nd, 
being measured, Ns > No + 2, to allow a correction for the 
bias in the estimated precision matrix, and Ns > Nd + 4 
to avoid a divergence in the error on the precision matrix. 
Hence the bias and covariance of the precision matrix only 
depend on the number-of-degrees of freedom, Ns — Nd, and 
the model precision matrix, and are independent of the de- 
tails of the experiment. This is a very simple, and powerful, 
result. 



4 AN EXAMPLE FROM COSMOLOGY 

4.1 Simulating a Weak Lensing Survey 

As an example of the bias and variance of the sample pre- 
cision matrix for a cosmological survey, we consider a sim- 
ulation of a weak lensing shear survey. In Appendix A3 we 
describe the weak lensing fields and power spectra. Here we 
consider a single shear field, with no B(/3)-modes. We gen- 
erated Ns = 100 samples of a 10 x 10 square degree weak 
lensing survey, with a Gaussian random shear field at a sin- 
gle redshift which we used to estimate the mean, covariance 
and precision matrix of the shear power. The surface density 
of galaxies is ni = 30 per square degree, with a median red- 
shift of z m = 0.9. We did this for sample sizes from Ns = 10 
to Ns — 100. We repeated this 100 times to generate inde- 
pendent groups of the Ns samples to estimate the mean and 
variance of the covariance and precision matrices. 

This numerical experiment is non-trivial, as it has 
some more realistic assumptions compared to our analysis. 
In the simulations the underlying shear field is Gaussian- 
distributed, rather than the shear power itself. In practise 
this is what we expect for the large angular scale shear field, 
while on small scales we expect the shear field to be nonlin- 
ear and non-Gaussian. This will test if, on large scales, the 
assumption of Gaussian-distributed power is justified. 

The Gaussianity of the shear field for a single redshift 
ensures that the shear power covariance matrix is diagonal, 



M 



Mi. 



data covariance matrix is 



M(Sf e ,. The covariance matrix of the sample 



AM t AM v 



■MUu>, 



(31) 



N s - 1 

while the covariance matrix of the unbiased precision matrix 
is 



Atf £ A3v > = 2A 



[(N s -N D - 2)*? 8& + .(32) 
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The Gaussian part is diagonal, as expected, however the 
shifted term introduces off-diagonal terms. The diagonal 
of the precision matrix now propagates into every term in 
the precision covariance matrix, with the ratio of diago- 
nal (Gaussian) to non-diagonal (shift-term) terms scaling 
as Ns — Nd — 1. The variance of the bias-corrected precision 
matrix is 



_x10' 



,N S -N D -4 



91 



(33) 



Taking the sum of this, or the trace of the covariance matrix, 
we recover equation (30). Gaussian-distributed weak lensing 
convergence power spectra, on different angular scales and 
between redshift-bins, has the covariance 

(\AC KK (£,z,z')\ 2 ) = 

- f l, u [\C KK {i, z, z')\ 2 + C KK (£, z, z)C KK {i, z', z')],(M) 

Jsky{M + 1) 

where / s k y is the fraction of the sky covered by the survey. 
If the shear power is binned into logarithmic passbands we 
divide by £A In £ = A£, the number of ^-modes in each pass- 
band. 



4-1.1 Whitening the covariance matrix 

In cosmological surveys the data values can span several or- 
ders of magnitude, and so the conditional number of the cor- 
responding covariance matrix is very large. This can cause 
numerical instabilities in the inversion to estimate the pre- 
cision matrix, as well as the failure of some non-standard 
estimators (see Section 7.2). If a good model of the covari- 
ance elements is known, one can whiten the covariance ma- 
trix (see e.g., Bond, et al., 1998) by rendering its diagonal 
elements close to unity. Denoting the model data covariance 
elements by M™ od and introducing a transformation matrix, 

Tij = (1/y M™° d )5ij , as the inverse of the square-root of 
the diagonal elements of the model covariance, we define a 
new, whitened, data covariance matrix by Mw = TMT . 
The whitened matrix, Mw, has a conditional number close 
to unity and is readily inverted, which we carry out using 
Singular Value Decomposition (SVD). The precision matrix 
is then obtained from the inverse of the whitened data co- 
variance via — 7 1 M W 1 1 1 . Here we whiten all data 
covariance matrices before correcting for whitening in the 
precision matrix. 



4-1.2 Numerical Results 

Figure 1 shows the mean and scatter (diagonal of the co- 
variance) in the estimated shear power spectrum from our 
Ns = 100 simulated weak lensing survey, compared to 
the ACDM input model power. We have chosen Nd = 36 
data points on the power spectrum as a compromise be- 
tween oversampling the power spectrum with correlated 
data points, and under-sampling and missing some of its 
features which will contain parameter information. 

Figure 2 shows the fractional bias in the trace of the 
mean of the sample precision matrix, which we define as 



B = 



£<** 



N s 




Figure 1. Simulated weak lensing shear auto-power spectrum, 
C KK (£, z,z) from 100 simulated 100 square degree surveys, for 
a single median redshift of z = 0.9. The solid line is the input 
power spectrum, while the data points are the mean estimated 
power spectrum, and the error bars are estimated from the 100 
samples. We repeated this set of simulations 100 times to estimate 
the covariance of the data covariance and precision matrices. 




30 40 50 60 70 
Number of Simulations/Group, N 



100 



Ns-Nd-2' 



(35) 



Figure 2. The fractional bias in the precision matrix of shear 
power spectra from Ns simulated and independent realisations 
of a 10 2 square degree weak lensing survey, with Njj = 36 power 
spectra data-points. The statistical properties of the precision 
matrix are generated from groups of 100 simulated surveys. The 
solid red line is the predicted scaling, while the vertical black line 
is the expected divergence for Ns = Nd + 2. Blue crosses are the 
estimated bias from the simulations, using equation (19), which 
closely follow the prediction. We have suppressed error bars on 
points, which are correlated. 



where VE* is estimated from equation (19), for Nd = 36 shear 
power spectra passbands as a function of number of realisa- 
tions, Ns, in each group. The Ns realisations are cumula- 
tive, so each point is correlated with points on the left. 

The numerical model, including the predicted diver- 
gence at Ns = Nd + 4 (solid vertical line), agrees ex- 
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tremely well with the prediction, as has previously been 
shown by Hartlap et al. (2007) in a similar cosmological 
context. There, they showed the bias followed the expected 
behaviour using a simulated weak lensing survey based on 
ray-tracing through many lines-of-sight of the Millennium 
N-body simulation, when the effects of non-linear clustering 
are included. The agreement between the simulations and 
prediction implies that the estimated precision matrix can 
indeed be debiased with the correction given by equation 
(25). 

Figure 3 shows the measured fractional scaling of the 
trace of the covariance of the sample precision (crosses), de- 
fined as 



Eyj — 



N S -N L 



(36) 



and the trace of the fractional error on the data covariance 
matrix (stars), defined as 



Em = 



EMM. 



Ns-V 



(37) 



as a function of number of realisations, Ns- We have plot- 
ted the Wishart prediction (solid and dotted lines) for both 
statistics. The data covariance can be estimated for Ng > 1 
data points, and its variance is stable below the Ng = Nd 
line. However, the variance of the precision matrix estimate 
diverges when we reach the number of data points, as pre- 
dicted by the Inverse- Wishart distribution. Again the sam- 
ple of Ns realisations is cumulative and so each point is cor- 
related to the points on its left. Again there is a very good 
agreement between the predicted and measured scaling of 
the variance of the data covariance matrix. The agreement 
between the predicted and numerical scaling of the variance 
of the precision matrix is also good, but there is some scat- 
ter and slight deviation which we attribute to the accuracy 
of the inversion of the sample data covariance. 



4.2 The size of future surveys 

As we have seen the main driver for the number of sim- 
ulations comes from the precision matrix, whose accuracy 
is driven by the number of data points in our sample, 
Ns > Nd + 4. Here we discuss typical values which will 
be encountered. The issue of data compression will be dis- 
cussed in Section 7.3. 



4-2.1 Pixelised or discrete data-sets 

In the case of pixelised data (Cosmic Microwave Background 
or Weak Lensing), or data where the individual data is sam- 
pled (Galaxy Redshift Surveys or Weak Lensing again) the 
number of data points can rise quite rapidly. If there is no 
analytic model for the pixel or data covariance matrix, the 
cost of simulations can be prohibitively high for Ns > Nd ■ 
We are then forced into some form of data compression, such 
as, in Cosmology, the estimation of two-point, or a number 
of n-point, power spectra or correlations. 




30 40 50 60 70 
Number of Simulations/Group, N 



100 



Figure 3. The error in the estimated precision and data covari- 
ance matrix from Ng realisations of the Weak Lensing power 
spectrum, with Nd = 36 data-points, generated from groups of 
100 10 2 square degree simulated surveys, as a function of Ng. The 
vertical black line is the number of data points. The blue stars 
are the statistical errors on the unbiased data covariance matrix, 
compared to the predicted scaling (dotted line). Blue crosses are 
the statistical errors on the unbiased estimator of the precision 
matrix, equation (25), compared to the predicted scaling. Again 
the agreement is good, with differences due to numerical effects. 
We have suppressed error bars on points, which are again corre- 
lated. 



4-2.2 Power spectrum analysis 

In the case of a galaxy redshift survey, we imagine typically 
Nk = 50 data points sampling the galaxy spectra. To sample 
the anisotropic distortion in redshift-space we would need 
JVfc data points. If this was repeated in Nb = 10 redshift 
bins, we would have a total of Nd = N^Nf, = 2.5 x 10 4 data 
points in total. 

For a tomographic, weak lensing power spectrum anal- 
ysis measuring N spcc power spectra, over iVf,-redshift bins, 
the total number of auto- and cross-spectra for spin- 2 fields 
(including B-modes) is Nb(2Nb + 1). If each power spec- 
trum has Ne passbands, the total number of data-points is 
Nd ~ NeNb(2Nb + 1). If we again assume Nb = 10, we can 
measure 210 different power spectra. With N( = 50 pass- 
bands per spectra per redshift, we have Nd = 1-05 x 10 4 
passbands, and the number of independent realisations we 
require is 



(38) 



If we add the lensing magnification power, C M ' 1 '(£, z, z'), to 
this, the estimated number of independent realisations in- 
creases by a significant factor. 

If we combined cosmological probes we can estimate 
the size of a combined data-vector. If we assume Nb = 10 
redshift bins, we have a total of Nf — 43 fields and 



N BP ec = |JV)(JV> + 1), (39) 
auto- and cross-spectra. For our example we then have 
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946 spectra. Assuming further that each spectrum 



has N( = 50 passbands, we have 

N D = N e N Bpcc , 



(40) 



or over 5 x 10 data-points in our data-vector. These raw 
numbers clearly represent a significant challenge for gener- 
ating realisations. In Section 7 we discuss ways in which to 
avoid the Wishart bound and the need to generate such large 
numbers of simulated surveys. 



5 PARAMETER COVARIANCE MATRIX 

5.1 Covariance of the Fisher Matrix 

Having found the statistical properties of the sample preci- 
sion matrix we now turn to our main goal, to understand 
how the accuracy of the precision matrix propagates into 
maximum likelihood parameter estimation. To do this, we 
use the Fisher matrix formalism (e.g., Tegmark, Taylor & 
Heavens, 1997) to see how inaccuracies in the precision ma- 
trix leads to inaccuracy in the Fisher matrix and leads to 
inaccuracy in the parameter covariance matrix. 

The log-likelihood, C = — 21ogL, can be expanded to 
second-order around its peak in parameter-space where the 
expectation value of the gradient of the log-likelihood is 
(d a C) = 0, while the expectation value of the curvature 
yields the Fisher Matrix, 



{d a dpC) = 2F a p. 



(41) 



The derivatives of the mean are taken with respect to the 
parameters. For Gaussian-distributed data, this is given by 
(Tegmark, Taylor & Heavens, 1997) 



(42) 



With the Gaussian approximation, the likelihood surface of 
the parameter-space is specified completely by the parame- 
ter covariance matrix, given by the inverse of the Fisher 
matrix, 



<E> a/3 = (A9 a A9p) = 



(43) 



If the data is again assumed Gaussian-distributed, the un- 
certainty on the precision matrix propagates into the Fisher 
matrix by 

AT a p = ^(dandpfj, 1 + a Q /j.'9 /3 /j.)A*, (44) 

where A\E> is a random variation in the precision matrix. The 
covariance between terms in the Fisher matrices is given by 



(AF^pAF^) = -(datudpn* + 3a/**0/3/*)(A¥A¥) 
x (d fl n t d„n + d^ixdvH*). 



(45) 



We shall assume that the uncertainty in the mean of the 
data, (i, is negligible. Substituting equation (26) in for the 
covariance of * we find the unbiased covariance of the Fisher 
matrix is 

(AJ^AJ^) = 

Aorr [(N s -N D -2) {TccpTpv + T av T^) + 2F a pF^\ ,(46) 
valid for Gaussian-distributed data. 



5.2 Covariance of the parameter covariance 

The parameter covariance matrix is the inverse of the Fisher 
matrix and so the uncertainty in the parameter covariance 
matrix is, to first-order, 



A$ c 



P 



J- AJ~ -yfiJ- $p , 



(47) 



where we assume summation over repeated indices. The co- 
variance of the parameter covariance matrix is 



(AQ^A®^) = ^ a 5^ n p{AFs v AF^)^^ ev . 



(48) 



Substituting equation (46) for the covariance of the Fisher 
matrix, we find the covariance of the parameter covariance 
matrix is 

[(N s -N D -2) {^oc^.p + ® a v®p») 

+ 2<E> a/3 $„ M ]. (49) 

This is a central result of this paper. From this we see 
that the Inverse- Wishart covariance propagates through to 
the covariance of the parameter covariance matrix, with 
the Gaussian and shift terms. Again this diverges if Ns < 
Nd + 4. The components of this matrix can be written as 

2 



|2\ 



(50) 



Ns — No — 4 
(lA^j 2 ) = A COTT [(N s - N D )r 2 a p 

+ (Ns -N D - 2)] $ aa $ w , (51) 
(A^^A^) = 2A COII [l + (N s - N D - 2)r 2 af) ] ® aa <f>pp, 

(52) 

(Afc^ASflB) = — 2 — 7 <f> a p$pp, (53) 



Ns — Nd — 4 

where we have defined the parameter correlation coefficient 

$a/3 



(54) 



From this result we see that the main factors which affect 
the covariance of the parameter covariances are the differ- 
ence between the number of realisations of the survey and 
the size of the dataset, Ns — Nd, the degrees-of- freedom of 
the precision matrix, and the parameter correlation coeffi- 
cient, r a @. This now provides us with a way to determine 
the accuracy with which we can estimate the distribution of 
parameter values in parameter space. 



5.3 Accuracy of parameter errors 

We can use this result to demonstrate how the accuracy of 
the errors on a parameter can be translated into the number 
of degrees-of-freedom in the data covariance, or equivalently 
the number independent realisations needed to estimate the 
precision matrix, and on the accuracy on the precision and 
data covariance matrices. For a single cosmological param- 
eter (marginalised over all other parameters), the error on 
the parameter variance is given by equation (50). Comparing 
this to equation (33), and assuming that the data covariance 
is diagonal, we see that the fractional accuracy of the pa- 
rameter variance is equal to the fractional accuracy of the 
precision matrix. 

Defining s as the fractional accuracy of the parameter 
variance (equation 50), 



© 0000 RAS, MNRAS 000, 000-000 



8 Andy Taylor, Benjamin Joachimi, Thomas Kitching 



8 c p /2 



Figure 4. Sketch of the error on a parameter, p, given by <r p , 
and the error on the error bar, ea p /2, where e is the fractional 
variance on the precision matrix. 



Ns-N D 



(55) 



we can consider an arbitrary parameter, p, with expected 
value po- A measurement of p will yield an uncertainty, a p , 
while the error on that uncertainty will be ea p /2. We can 
write this as 



: Po ± 



l±\e 



(56) 



Figure 4 shows a sketch of this, illustrating for a single pa- 
rameter the error bar and error on the parameter error. 

The fractional error on the diagonal elements of the 
precision matrix is then given by (from equation 29, see also 
equation 33), 

<x[¥«] 



If, 



(57) 



Independent of the details of the survey, we find the required 
number of independent realisations of the survey for a given 
parameter accuracy and number of data points is 

N s > ^ + (N D + 4). (58) 

The first term here is the usual root-iVs scaling for indepen- 
dent samples, and sets a lower limit on the number of inde- 
pendent realisation required to reach a given accuracy. The 
second term arises from the Inverse- Wishart variance, where 
the number of independent realisations needed to reach a 
given accuracy scales as the number of data points. Figure 
5 shows the scaling of the number of independent samples, 
Ns, with the size of the data-set, Nd- The value of Ns in 
the limit Nd — > 0, is set by the desired accuracy of the 
parameter variance. 

The fractional error on the data covariance matrix, for a 
given parameter error accuracy and number of data points, 
is 



< 



2e 2 



(59) 



\Mu\ \2 + e 2 {N D +4) 
This has two regimes. When e 2 < 2/(N D + 4), i.e. for small 



10" 



10 



10' 



- £=0.01 

£=0.05 

-£=0.10 

£=0.50 
£=1 .00 



Figure 5. Scaling of the number of independent realisations of 
the survey, Ns, as a function of the size of the data set, Nd, for 
different fractional accuracies on the variance of the parameter 
variance, e. 



data-sets compared to the required accuracy, the fractional 
error on the data covariance scales as 



o[Mu 
\Mu\ 



(60) 



the same as for the precision matrix and the variance on the 
parameter variance, while for e 2 3> 2/ {No + 4), when the 
data-set is large, the error on the data covariance scales as 



\Mu\ 



< 



No +4 



< e. 



(61) 



For large-data sets, this scaling puts the strongest con- 
straints on the accuracy of the data covariance matrix, which 
can be much higher than the accuracies of the precision and 
parameter covariance matrices. 



5.4 Constraining the parameter error 

While the accuracy of the parameter error is set by the num- 
ber of independent realisations of the survey, Ns, and the 
data size, No, it is useful to consider what typical accura- 
cies any analysis should achieve, independent of the details 
of the particular survey. A reasonable accuracy for a pa- 
rameter error is 5%, since a much higher accuracy will put 
strong requirements on the number of realisations, while 
lower accuracy will compromise the measurement error. This 
requires that the marginalised parameter variance should be 
accurate to 10% and that the precision matrix, is accu- 
rate to 10%, or e — 0.1. From equation (58), this requires 
Ns > 200 + Nd + 4 independent realisations of the survey 
to reach this accuracy. 

For a small data set, with Nd <C 100, a minimum of 204 
independent realisations of the survey yields a 5% error on 
the parameter error. This implies that the data covariance 
matrix is accurate to 5%. When the data-set becomes Nd 
100, we require Ns > Nd + 4 independent realisations, and 
the fractional accuracy of the data covariance matrix scales 
as 
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o\Ma 
\Mu\ 



« 5%. 



(62) 



In particular, for forthcoming tomographic cosmological sur- 
veys with 10 redshift bins, we can expect some 10 4 power 
spectra data points requiring at least this number of inde- 
pendent realisations. This will increase the accuracy on the 
data covariance matrix to 1.4%. For combined data-sets the 
number of data points can rise to Nn ~ 10 6 , and hence re- 
quire an accuracy of 0.14% on the data covariance matrix. 
Achieving these accuracies will be challenging. 



6 FIGURES-OF-MERIT 

6.1 Uncertainty of the Figure-of-Merit 

In addition to marginalised parameter errors, it is useful to 
know how the uncertainty in the precision matrix affects the 
FoM. The FoM is the inverse of the enclosed area within a 
certain likelihood contour, and is frequently used as a target 
statistic to optimise cosmological surveys. Our aim here is to 
understand how inaccuracies in the precision matrix prop- 
agate through the parameter estimation into a FoM, and 
how fixing the required FoM can be used to put constraints 
on the accuracy of the precision matrix and data covariance 
matrix. 

The dark energy Figure-of-Merit (DE FoM), E WQWa , is 
defined as the inverse of the area of the 68% error-ellipse for 
a two-parameter dark energy model (e.g., Albrecht et al., 
2006), 



0.95 




0.95 



0.95 



0.25 0.3 0.35 

n 




FoM/1 s 



Figure 6. Numerical simulation of the parameter likelihood con- 
tours, estimated from a simulated weak lensing survey. The left- 
hand side (LHS) shows the 2-parameter 86.3% likelihood contour 
for the cosmological parameters, f2 m , the density parameter of 
matter and the clustering spectral index n a . The right-hand-side 
(LHS) shows the frequency distribution of the FoM (inverse area) 
for these parameters over 100 realisations. The top row is for a 
sample-size of Ns = 50 realisations, the middle row for N$ = 80, 
and the bottom row for Ns = 100. As the number of realisations 
increases the accuracy increases as predicted. 



$2 

^WQ w a 



(63) 



where wo and w a parameterise the dark energy equation of 
state, w(a) = pde(a) / Pde(a) = wo + w a (l ~ a) (Chevallier & 
Polarski, 2001; Linder, 2003), where a(t) is the cosmologi- 
cal sale factor, and pd c and Pd a are the energy-density and 
pressure of the dark energy. We define a general FoM matrix 
for any two parameters as 

where no summation over the repeated indices a and j3 is 
implied. In the second expression we have used the param- 
eter correlation coefficient, r a p. 

It is useful to consider the inverse of the elements of the 
FoM, the area of each ellipse in the parameter space 

A a p = — — . (65) 

The fractional change in H aj 3 due to a change in the area is 
A5 a/ 3 _ AA a p 



A 



a/3 



(66) 



Varying the parameter covariance matrix in the FoM, we 
find the fractional change in the area of the error ellipse is 



AA 



a/3 



A 



aP 



2(1 -rL) I $ 



PP 



-2r a p 



aP 



.(67) 



Using the results of the covariance of the parameter covari- 
ance matrix, equations (50) to (53) for the unbiased preci- 
sion matrix, we find the variance of each FoM is 



V 2 \5.ap] 



(N s - N D ) 



{N s - N D - 4)(N S - N D - 1)' 



-a/3 | 



(68) 



Once again, the fractional variance of the FoM depends only 
on the difference between the number of independent real- 
isations of the survey used to estimate M and "J/, and the 
size of the data set. 

In Figure 6 (LHS) we show the 2-parameter, 
marginalised likelihood surface in the Q m — n 3 plane for 
our weak lensing simulations. Each ellipse is a 2-parameter, 
68.3% likelihood contour for the group of simulations with 
Ns realisations. As the number of realisations increases from 
Ns = 50 to 100, we see the spread in areas decreases. To 
quantify this, in Figure 6 (RHS) we plot the frequency dis- 
tribution of the FoM for this parameter plane. 

Figure 7 shows the predicted uncertainty in the FoM, 
equation (68), compared to the variance of the FoM distri- 
butions shown in the RHS column in Figure 6, as a function 
of number of realisations, Ns- We see good agreement be- 
tween our prediction and the error measured on the FoM 
from the weak lensing simulation. 



6.2 Accuracy of the Figure-of-Merit 

In the design of many cosmological surveys, the FoM is used 
as a target statistic to optimise the survey design, varying 
area, depth and number of photometric passbands to find 
the design which maximises the FoM. Having set this op- 
timal FoM, we then want to keep biases and uncertainties 
down to a level which does not violate the expected FoM. 
Here we develop an approach which uses the required FoM as 



© 0000 RAS, MNRAS 000, 000-000 



10 Andy Taylor, Benjamin Joachimi, Thomas Kitching 



[.) 10' 



10 



10 ' 



10 



_(N;-N D )/(N s -N D -4)/(N s -N D -1): 




20 



30 40 50 60 70 
Number of Simulations/Group, N 



80 



90 



100 



Figure 7. The scaling of the fractional error on the FoM, S a n, 
as a function of the number of simulated realisations of a weak 
lensing survey, Ns, with No = 36 data points. The solid blue 
line is the scaling predicted from the Inverse Wishart distribution, 
while the red line is the scaling found from the simulated surveys. 



a constraint to determine the number of survey realisations 
needed to do this. We then translate this into the accuracies 
of the precision and data covariance matrices. 

In order to keep within a required FoM we set the con- 
straint that the uncertainty in the likelihood area, when 
added in quadrature with the area, should not exceed some 



fiducial value, A 





a/3! 



Ale+a 2 [A af3 ]<(A%) 2 



(69) 



Assuming the uncertainty in the area is small, and taking 
the expectation value, we can re-write this in terms of the 
FoM, 



(70) 



The effect of a random change in the area of the error el- 
lipse in parameter-space will, on average, reduce the FoM. 
Hence the actual FoM we need to measure, E Q( 3 , to meet the 
required S° 3 is increased. We define the fractional error in 
the FoM as 




(71) 



In order to keep the fractional increase in the FoM below 
some value, e|, we can solve equation (68) for Ns and find 
that the number of independent realisations should be 



Ns>Nd + _ + _ 

In the limit that £= C 1 we find 
N s > N D + -=-. 



1 + V(l + 94)(l+ei 



(72) 



(73) 



Again, we see that for small data-sets the number of reali- 
sations is fixed, this time at Ns = 125, while for large-data- 
sets the number of realisations again scales as the number 
of data points. 



6.3 Accuracy of the precision matrix 

To set a constraint on the accuracy of the precision matrix, 
for a given accuracy on the FoM, we again only consider the 
diagonal components of the precision matrix, where a[^/u] — 
e\9u\ (see equation 57). Substituting the constraint from the 
FoM on the number of realisations, equation (72), we find 



g 2 [fr»] 



(75) 



1 + v /(l + 9e|)(l + e|)-3el 

which only depends on the ac curacy of the FoM. In the 
high-accuracy regime, e= < 1, this reduces to 

ct[*«] 



(76) 



If we require for the FoM that e= = 0.1 this implies that 
ff[*«]«0.19|*«|, (77) 
or an accuracy of 19% on the precision matrix. 

6.4 Accuracy of the data covariance matrix 

Given we know that 



a[Mu\ 
\Mu\ 



N s ' 



(78) 



and that the number of independent realisations required 
to reach a given accuracy of the FoM scales according to 
equation (72), we can write 

a[Mu] 



2e~ 



\M,A 



'l + {5 + 2N D )el + v /(l + 9 £ l)(l + e|) 

In the high-accuracy regime, e| <C 1, this reduces to 

(t[M u ] _ / 2g| 
IMal ~ V 1 + NdA' 



(79) 



(80) 



which for Nd£b <§C 1 reduces further to ftt y/2e~,, scaling like 
the fractional accuracy of the precision matrix in the same 
regime, while for Nns% 3> 1 the fractional error reduces 
to \J 2/Nd < Vies- Hence, for high-accuracy FoM's and 
large data-sets, the accuracy of the data covariance matrix 
is driven by the size of the data-set. 

For our fiducial accuracy of e — 0.1 we find 



cr\Mu\ 
\Mu\ 



0.02 



1 + (AW 100)' 



(81) 



where for Nd <C 100, the error on the data covariance is 
a[Mu] w 0.19|Mii|, the same accuracy as the precision ma- 
trix, while for No 2> 100 we find the y2/Nr> scaling. 



If we want the fractional error on the FoM to be 10%, the 
number of realisation required (using equation 72) is 



N s > N D + 125. 



(74) 



7 BEYOND THE WISHART BOUND 

The main conclusion of our analysis is that without an accu- 
rate model data covariance matrix, and if we sample the data 
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Figure 8. Different routes to determining inverse covariance matrices for use in likelihood analysis. The bottom level shows different 
ways to estimate sample and model covariance matrices, leading to the precision matrix and the likelihood function. 



covariance, the number of independent realisations needs to 
be greater than the number of data-points we are analysing. 
For large data-sets, such as for the surveys now underway 
in Cosmology, this requires a prohibitively large number of 
simulations of the data. In this Section we discuss alternative 
routes to obtaining an accurate precision matrix which may 
help to meet, or avoid, the tight requirements set by simple 
estimation. Alternative approaches will also provide valuable 
consistency checks on some of our assumptions. We consider 
four methods: theoretical modelling of the data covariance, 
optimal estimators, data compression, and simulation and 
data resampling. Figure 8 shows a schematic view of these 
methods and their relationship. We shall not consider more 
radical alternatives, such as going directly to estimates of 
the likelihood function itself. 

7.1 Theoretical modelling of the data covariance 

The problems of noise, inherent to the sample covariance 
matrix, are avoided altogether if we can accurately model 
the data covariance matrix analytically. Modelling of the 
data covariance ranges from assuming the data is Gaussian 
distributed on large-scales (e.g. Kaiser, 1992; Knox, 1995) to 
assessing the impact of non-linear clustering, using pertur- 
bation theory and simulations, on the galaxy power spec- 
trum covariances (Meiksin & White, 1999) and the weak 
lensing power spectra covariance (Scoccimarro, Zaldarriaga, 
Hui, 1999). In addition, the halo model has been used to esti- 
mate the covariance matrix for the matter and weak lensing 
power spectra (e.g., Cooray & Hu, 2001; Takada & Bridle, 
2007; Takada & Jain 2009; Kayo et al., 2012), while estima- 
tion of the data covariance matrix for a lognormal field (e.g., 
Hilbert, et al., 2011) seems to reproduce the main features of 
the covariance structure for weak lensing correlations. Given 



the level of difficulty in modelling the nonlinear regime one 
needs to test the range of validity of these model against sim- 
ulations. For example, Takahashi et al. (2009) have tested 
nonlinear modelling of the galaxy clustering data covari- 
ance matrix on a suite of 5000 simulations, while Sato et al. 
(2009) have tested nonlinear estimates of the weak lensing 
covariance on simulations. Kiessling et al. (2011) have also 
studied modelling non-Gaussian covariances and parame- 
ter forecasting using weak lensing simulations. Hamilton & 
Rimes (2005, 2006) first pointed out the loss of information 
in the quasi-nonlinear regime in the matter power spectrum 
due to non-Gaussianity using simulations, while Hamilton, 
Rimes & Scoccimarro (2005) have discussed some of the is- 
sues with estimating the data covariance matrix from sim- 
ulations. Theoretical models of the data covariance matrix 
have been applied to parameter estimation from data with 
the CMB (e.g. Verde, et al., 2003; Spergel, et al., 2003), 
galaxy redshift surveys (e.g. Ballinger, et al., 1995; Tadros, 
et al., 1999), and weak lensing (e.g., Brown, et al., 2003; 
Kitching, et al., 2007). 

Even if modelling is not precise, one could develop ana- 
lytic parameterised fitting functions to the data covariance, 
fitting the free parameters to simulations or the data. Im- 
portantly, these physically motivated models could allow one 
to incorporate the cosmology dependence of the covariance, 
which would require a substantial increase in the number of 
simulated realisations to cover parameter space. Sometimes, 
truncation or smoothing of the sample data covariance is 
used to suppress noise (e.g. Mandelbaum et al., 2012). How- 
ever, such approaches alter the number of degree-of-freedom 
in the data covariance and so we would no longer know how 
to correct the precision matrix for bias. 

If we assume for the moment that we can model the 
theoretical uncertainty on the data covariance matrix as 
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random, even for the next generation of surveys when we 
expect No ~ 10 4 , the data covariance has to be known to a 
few precent accuracy, which will become a problem for the- 
oretical computation. For surveys with 10 6 data points the 
accuracy of the data covariance has dropped to a fraction 
of a percent, putting high demands on its calculation (see 
Section 5.4). 



7.2 Optimal precision estimators 

If we do not have a reliable model or fitting function to the 
data covariance, we can still suppress the noise in the sam- 
ple covariance by combining it with some simple model or 
prior knowledge. This is generally referred to as shrinkage 
estimation. One can define optimised covariance estimators 
in the sense that they yield smaller variance than equation 
(29) while keeping the bias small. We investigate the perfor- 
mance of three well-known cases. 



from the data as well. The estimate for the precision matrix 
is given by the inverse of 



M shrink = A T + (1 - A) M , 



(86) 



where T is the theoretical target covariance matrix. This for- 
malism has been applied to covariance estimation of galaxy 
clustering power spectra by Pope & Szapudi (2008), and to 
the CMB by Hamimeche & Lewis (2009). Ledoit & Wolf 
(2003) derived an analytic estimator for the shrinkage in- 
tensity A, thereby greatly reducing the computational cost 
of this form of shrinkage estimation. It is given by (see also 
Schafer & Strimmer, 2005) 



A 



Var[My] - Cov[T, 



ET Var[My - Tij] + (My - T i: 



(87) 



where the variances and covariances are computed from the 
Ns realisations, so then (see also Pope & Szapudi, 2008) 



7.2.1 Shrinkage: Stein precision estimator 

Stein et al. (1972) have proposed the precision estimator 

(82) 



* Stein = Ns-N D -2 $ + N D (N D + l)-2 I 



N s - 1 



(N s - 1) Tr M 



which is defined for N s > N D + 2. If N s > N D , the 
estimator reduces to the unbiased estimate, * unbiased- If 
Ns ~ Nd 3> 1, the estimator returns 5 1-1 1, where 



S = —5— Tr M 

N D 



(83) 



is the average of the diagonals of M. This is exact if the co- 
variance is diagonal and homoscedastic. This estimator has 
smaller loss than any estimator that is proportional to ^ 
for a 'natural' loss function (Stein, et al., 1972), a general- 
isation of least squares between the matrix elements of the 
estimator and the true precision matrix. 



7.2.2 Shrinkage: Haff precision estimator 

A second estimator, suggested by Haff (1974), is; 



Haff 



Ns — Nd — 2 



N 8 - 
with 

U = S _:l |M| 1/JVd 



^ ((i-Vu^+Vus- 1 1 



(84) 



(85) 



again defined for Ns > No +2 only. The variable U measures 
disparity among the eigenvalues of M and lies in the interval 
< U < 1, shifting from the unbiased sample estimator 
(U — 0) to the estimator oc S^ 1 1 (U = 1). This estimator 
has smaller loss than any estimator that is proportional to 
\E> for a whole class of loss functions (Haff, 1974). However, 
this property is only guaranteed close to the divergent case, 
in our case for Ns < Nd + 4. 



7.2.3 Target data covariance shrinkage 

Shrinkage in its narrower sense refers to covariance estimates 
in which the balance between the sample covariance and the 
assumed model (the 'target', see Section 7.1) is estimated 



where 

= AD a , AD 



(89) 



As only one set of realisations is available in practice, the 
means in the denominator of Equation (87) have to be re- 
placed with estimates of M and T. If the target matrix is 
noise-free, Cov [Ty, My] = and Var[My -Ty] = Var[My]. 

If Mi] — Tij = 0, the target accurately describes the co- 
variance in the data and A = 1. Conversely, the shrinkage 
intensity tends to zero if the target attains a similar noise 
level as, and/or if the mean target deviates strongly from 
the mean of, the sample data covariance. 

We consider two choices for our target matrix to test 
on our weak lensing simulations. The first is a theoreti- 
cal estimate of the covariance matrix based on a Gaussian- 
distributed power spectrum (e.g., Kaiser, 1992), 



/ sky ^Aln€ 



I ■ 



(90) 



where / s k y is the fraction of the sky covered by the sur- 
vey, and we have assumed log-binning. Since this should 
correspond closely to the simulations we have generated we 
expect A — > 1. The second is an empirical target matrix, 

T 2 = SI , (91) 

which represents a minimum-knowledge approach. 



7.2.4 Testing Shrinkage 

Figures 9 and 10 show the fractional bias and error on the 
precision matrix for the Stein estimator (black points), the 
Haff estimator (green points) , and the two target shrinkage 
methods, model (blue points) and mean (purple points), ap- 
plied to our weak lensing simulations . 

• Stein estimator: The Stein estimator is more biased 
than the simplest sample estimator for large numbers of re- 
alisation, even though it asymptotes to become the same 
estimator. Simulations would be required to calibrate this 
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Figure 9. The bias in the estimated precision matrix from TVs 
realisations of the Weak Lensing power spectrum, with TVjj = 
36 data-points, generated from groups of 100 10 2 square degree 
simulated surveys, as a function of Ng. The blue circles are for 
direct inversion of the unbiased data covariance matrix, while 
green are for the Stein and Haff estimators. 
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Figure 10. The error in the estimated precision matrix from Ng 
realisations of the Weak Lensing power spectrum, with TVjj = 
36 data-points, generated from groups of 100 10 2 square degree 
simulated surveys, as a function of Ng. The blue circles are for 
direct inversion of the unbiased data covariance matrix, while 
green are for the Stein and Haff estimators. 

for any particular experiment. The variance of the Stein es- 
timator is low, yielding an almost constant 5% error. 

• Haff estimator: The Haff estimator is less biased than 
the Stein, and the sample estimator, but still shows signifi- 
cant bias. The variance of the Haff estimated precision ma- 
trix is again around 5%. 

• Target estimator: The Ti model target shrinkage es- 
timator is essentially unbiased, as we expect for an accurate 
model, and works for TVs < TVd + 4. The Ti model also does 
best at lowering the statistical to around 1% accuracy. The 
T2, empirical estimator yields a similarly unbiased precision 



matrix, which is of interest. The error on the estimated pre- 
cision matrix is slightly higher than the model target, at 5%, 
similar to that for the Stein and Haff estimators. 

We conclude that the model estimator, Ti, works im- 
pressively well when the model is a good approximation, 
minimising the bias in the precision matrix and reducing 
the error in the precision matrix to a few percent. For more 
realistic simulations, we would expect to take advantage of 
the more detailed theoretical models (see Section 7.1). In- 
terestingly, the empirical T2 target is similarly unbiased, 
with a 5% error, although we caution that our simulation 
covariance is also diagonal. Both of these estimators would 
satisfy our goal of an unbiased parameter errors with 5% 
error, requiring only TVs ~ 30 realisations compared to the 
TVs > 240 needed for the sample data covariance. We imag- 
ine this would work just as well for much larger data-sets. 
The Haff and Stein estimator have a bias similar to the sam- 
ple estimator, but without the known correction factor. One 
would have to calibrate these with simulations. If this can 
be done, the error is sufficiently low to be useable. 

7.3 Data compression 

Independently of how the realisations to compute the sample 
covariance matrix are obtained, one can lower the Wishart 
bound by reducing the size of the data vector. Karhunen- 
Loeve eigenvalue methods (Tegmark, Taylor & Heavens, 
1997, and references therein) are widely used in astronomy 
to compress data by finding a smaller data vector that max- 
imises the Fisher information (preserving parameter infor- 
mation), while simultaneously diagonalising the new data- 
vector covariance. Heavens, et al. (2000) introduced a linear 
implementation that is lossless for an arbitrary number of es- 
timated parameters if the data covariance is not parameter- 
dependent, and otherwise still performs better than princi- 
pal component analysis of the Fisher matrix. Data compres- 
sion can also help stabilise the inversion of the data covari- 
ance, since the noisy modes which cause numerical instabil- 
ities are removed (e.g., Taylor, et al., 2001). 

We can consider what optimal compression may achieve 
based on general considerations. The compression method 
of Heavens et al. (2000) compresses data to one element per 
non-degenerate cosmological parameter. Future surveys will 
constrain cosmologies with a large number of parameters 
together with a substantial list of calibration and nuisance 
parameters, so that one can expect several tens of parame- 
ters in total, corresponding to a compression to TVrj ~ 100. 
Alternatively, one can argue from the number of character- 
istic features in the matter power spectrum, which features 
a number of transition scales beyond the overall amplitude 
and slope, and the clustering growth rate, leading to an es- 
timate of several tens of parameters. 

7.3.1 Compression in CMB analysis 

Gupta & Heavens (2002) demonstrate that the some 400 
temperature power spectrum modes can be compressed to 
the 10 — 20 cosmological parameters of interest. The com- 
putation of these modes requires a one-off 0(TV pa raTVf>) cal- 
culation, compared to the 0{N%) needed for a brute-force 
likelihood approach. However our aim here is to minimise 
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the uncertainty on the likelihood function, rather than to 
speed up the analysis. 

7.3.2 Compression in weak lensing 

In lensing the COSEBI two-point statistics are an efficient 
way of compressing angular weak lensing data. Asgari et al. 
(2012) found that of the order 10 modes capture the bulk 
of the cosmological information. Moreover, the B-mode is 
expected to have low signal-to-noise while cross-correlations 
between E- and B-mode should vanish altogether, so that 
these signals can be compressed efficiently into a small num- 
ber of elements. 

For radial lensing modes, Heavens, et al. (2003) applied 
Karhunen-Loeve methods to a 3-D weak lensing analysis 
and found that only 4 (out of 100) radial modes contain 
significant information (see Hu, 1999, for similar conclu- 
sions on tomographic weak lensing data). Tomographic or 
3-D weak lensing analyses will also have to simultaneously 
model intrinsic galaxy alignments, which are primarily sep- 
arated from the lensing signal via their different redshift 
dependence. A maximum number of 10 radial elements in 
the data vector per angular frequency should be a conser- 
vative estimate. The main issue is then the independence 
of these modes, since we have of the order ~ N% cross- 
spectra. If the modes are independent, we only need around 
Nd ~ 100 spectra to consider, in agreement with the es- 
timate for Karhunen-Loeve methods. If they are not truly 
independent and we rely on angular compression, we may 
only compress weak lensing data by a factor of a few. This 
gives us a range of possible compression factor from 10 to a 
few. 

If data compression can compress data so that Nd ~ 
-/Vpara ~ 100, this would be very powerful and imply wc 
only ever need around Ns ~ 300 realisations for any survey. 
However, this is probably over-optimistic, and if the real 
compression is a factor of 10, large data-sets of Nd ~ 10 4 — 
10 6 will still be difficult to accurately analyse. 

7.4 Resampling techniques 

In previous Sections we have considered how we can reduce 
the need to generate large numbers of realisations of our 
surveys to ensure the accuracy of parameter errors. Here we 
discuss how we could generate these realisations. In general 
we can divide this into external realisations, usually from 
simulations, or internal realisations from e.g., Jackknife or 
Bootstrap resampling of the data itself. 



achieve the same error tolerance on the covariance matrix is 
reduced by a factor of 8, at the price of having to run these 
simulations into the future and with more frequent snapshot 
outputs. 

7.4-2 Internal resampling: Jackknife 

If the statistical properties of the data are poorly known, 
one can create internal samples by resampling the observed 
data itself. In this case the data covariance is only estimated 
at one point in parameter-space, from a single realisation. 
A long-established method is the Jackknife method (Tukey, 
1958), which, in the astronomical context of a correlated spa- 
tial random process, requires the survey to be split up into 
iV su b equally sized sub-regions. Jackknife samples are con- 
structed by deleting one sub-region in turn (the delete-one 
Jackknife) and using the galaxy catalogues of the remaining 
survey area to re-compute the signal mean. The Jackknife 
covariance of this mean is then given by (Efron, 1980) 

Mjack = JV "" b ~ 1 Y AD a AD a , (92) 

a=l 

with AD a — D a — D a , where the subscript a indicates both 
the realisation and that the sub-region a has been deleted. 
In the limit of uncorrelated data equation (92) is equivalent 
to the standard estimator, equation (10) applied to the sur- 
vey sub-regions (Efron, 1980; Shao & Wu, 1989). In this case 
there is no advantage in using Jackknifing. If there are cor- 
relations between sub-regions, these will be missed by stan- 
dard estimation while the Jackknife, taken over the whole 
survey bar one sub-region, will measure these. 

We can estimate the number of sub-regions required for 
the Jackknife, by using the results for the Wishart distribu- 
tion, assuming that correlations between the sub-regions are 
negligible, where N BU h — Ns- For example in the case of a 
weak lensing survey covering 15,000deg 2 , the requirement 
of having of order Ns = 10 4 realisation implies that the 
sub-regions would be little more than 1 degree on a side^ . 
To avoid bias due to the impact of sub-region boundaries, 
the largest scales that could be probed would have to be 
much smaller, and hence jackknife estimates are likely re- 
stricted to, but potentially useful on, smaller scales. Even in 
this limit, the estimator based on equation (92) will still be 
biased because the sub- regions are not independent, due to 
residual correlations, so that the actual number of degrees 
of freedom is smaller but unknown. In this case, we do not 
know how to correct for the Wishart bias. 



7-4-1 Simulation Mode Resampling 

If we do need to create large numbers of external samples 
via cosmological simulations, Schneider et al. (2011) have 
proposed a method to rapidly generate, pseudo-independent 
random realisations from a single N-body simulation. This 
resampling the large-scale, quasi- Gaussian Fourier modes 
using a semi-analytic formalism. The approach reproduces 
power spectrum covariances well, including the coupling be- 
tween linear and non-linear scales, although a small bias in 
the covariance elements is introduced. Schneider et al. (2011) 
find that the number of full N-body simulations required to 



7-4-3 Internal resampling: Bootstrap 

Another class of resampling techniques, containing a large 
number of variants, is the Bootstrap (Efron, 1979). This as- 
sumes that the empirical distribution function of a sample 
of independently and identically distributed data of size n 

T Note that we do not consider to split up the survey into sub- 
volumes as done in Norberg et al., (2009) because of the very 
strong correlations expected along the line-of-sight for a weak 
lensing survey, due to photometric redshift errors and particularly 
the broad lensing kernel. 
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provides an unbiased estimate of the true underlying popu- 
lation from which the data has been drawn. Point estimates, 
e.g. of the covariance, can be obtained via standard estima- 
tors such as equation (10), by generating new samples from 
this empirical distribution, where the sum now runs over the 
number of bootstrap samples, Nb ■ If the bootstrapped sam- 
ple has size N r , the maximum number of distinct samples 
that can be drawn is (n+N r — l)!/[(n— l)!iV r !], which quickly 
grows large for moderate n and N r , so that usually Monte 
Carlo methods are employed to create bootstrap samples. 
Sub-regions of the survey are resampled, analogous to the 
jackknife. In astronomy these resampled sub-regions are gen- 
erally chosen to be non-overlapping (fixed-block bootstrap) , 
although this may be suboptimal (Nordman, et al., 2007). 

If Nb is large, the uncertainty on the covariance esti- 
mate becomes negligible, so that its inverse is an unbiased 
estimator of the precision matrix. However, there are multi- 
ple other sources of bias in the bootstrap technique which are 
hard to quantify. If the number of sub-regions is small, the 
empirical distribution is a coarse representation of the un- 
derlying population, and any local features may be missed, 
and the convergence of the error tolerance is slower (Nord- 
man, et al., 2007). If the number of sub- regions is large and 
their area small, the correlation structure is not well pre- 
served. It is possible that some sub-regions are not drawn 
at all in a given bootstrap sample, so that the total area 
coverage of this sample is less than that of the original sur- 
vey, biasing the covariance estimate upwards. Drawing more 
sub-regions for bootstrap samples than the iV su b regions of 
the original data, i.e. N r > iV su b, can remedy this bias (Nor- 
berg, et al., 2009), but the choice of iV r constitutes another 
parameter that requires calibration. 

While the bootstrap method effectively by-passes the 
Wishart bound by creating a very large number of realisa- 
tions, its multiple sources of bias, which all depend on the 
data set at hand, make an application to precision mea- 
surements questionable. Note that this is an active field of 
research, so that this conclusion could change on moderate 
time scales (see e.g. Loh, 2008, for a recent application of a 
spatial bootstrap variant to large-scale structure data). 

7.5 Summary of Alternatives 

Which of the various routes to inverse covariance estima- 
tion is optimal depends strongly on the problem at hand, 
being influenced by aspects as diverse as the survey charac- 
teristics, the complexity of obtaining the signal, the compu- 
tational cost of acquiring simulated survey realisations, the 
availability and accuracy of models, and the questions one 
aims to answer with the data. Clearly, if one has an accurate 
model of the data covariance to hand, this is what should 
be used. A poorer model can still be used, either to aid a 
fitting-function approach, or as the model target for shrink- 
age. Even if a model is not available, empirical shrinkage 
appears to work well, and all of these methods seem unbi- 
ased and yield a statistical error on the precision matrix less 
the 5% even when Ns < Nd ■ Stein and Haff shrinkage yield 
a bias which must be calibrated. Data compression may op- 
timally reduct the number of data-points to the number of 
model parameters, typically a few hundred values, at the 
expense of loss of some information, and model-dependency. 
However, if not optimal the gain may only be a factor of 10's. 



Mode-resampling of simulations may lead to production of 
large numbers of realisation, with a factor of ~ 8 possible. 
Empirical resampling of the data will lead to bias estimates 
which will need calibration, but the Jackknife has promise in 
the small-scale, highly-subsampled regime. Table 1 provides 
a summary of our findings. Finally, we caution again that 
our tests have been on idealised data, which closely match 
our model, while data compression and resampling methods 
require detailed testing. 

8 SUMMARY AND CONCLUSIONS 

Over the next decade and beyond the size of cosmological 
data sets, and the potential accuracy and ability to probe 
new physics, will continue to rise dramatically. To ensure 
that the expected accuracies are reached, we need to con- 
sider the dominant sources of bias and uncertainty in our 
measurements. In this paper we have developed and explored 
a new framework to study the effect of random errors in the 
estimation of the data covariance matrix and its inverse, the 
precision matrix, on cosmological parameter estimation. For 
multivariate Gaussian data, the likelihood function depends 
sensitively on the precision matrix. 

In many areas of cosmology and astrophysics, the data 
covariance matrix cannot be predicted analytically and we 
must rely on estimating it from independent, random realisa- 
tions of the observations. The simplest estimator of the data 
covariance matrix is unbiased, and the uncertainty drops 
with the inverse square-root of the number of independent 
samples. More generally, the sample data covariance matrix 
follows a Wishart distribution. In contrast, the simplest es- 
timator for the precision matrix, taking the inverse of the 
sample data covariance, is biased. We can find an unbiased 
estimate if the difference between number of realisations, 
Ns, and number of data points No, is Ns — No > 2. How- 
ever, the precision matrix follows an Inverse- Wishart distri- 
bution, and its variance can diverge if Ns — No < 4. 

We have tested and illustrated this behaviour by simu- 
lating 10 4 weak lensing surveys and dividing into 100 groups 
of Ns = 100 samples, and shown the mean and variance 
of the sample data covariance and precision matrix follow 
the predicted properties of the Wishart and Inverse- Wishart 
distribution. Future cosmological surveys will have of order 
Nd ~ 10 4 — 10 6 data-points, even with radical compres- 
sion into power spectra and correlation functions, and so the 
Wishart bound implies large numbers of realisations will be 
required. In addition, the only known unbiased estimator of 
the precision matrix is also the simplest, so other methods 
will be biased and we may not be able to quantify this bias 
without comparing with simulations. 

The properties of the precision matrix where propa- 
gated into the uncertainty in the errors on cosmological 
parameters, using a Fisher matrix formalism, and we have 
shown that: 

• The fractional errors on the variance of a parameter and 
the diagonals of the precision matrix are equal. 

• The fractional error on the parameter variance depends 
on inverse square-root of Ns — Nd — 4, which can diverge 
when the number of realisations, Ns is equal to Nd + 4. 

• The number of realisations needed to reach a given ac- 
curacy on parameter errors must be greater than the sum 
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METHOD 




Gain 


Cost 


Comment 


Unbiased 




Unbiased, known variance 


N s > 200 + No + 4 


Costly for large No 


ivxoueiiing 




Unbiased/No variance 


Complex modelling 


Nonlinear, baryonic physics 


Shrinkage 


otem 
Haff 


factor ~ 5 reduction in Ng 
factor 5 reduction in Ns 


Unknown bias 
Unknown bias 


nange i\ s s> jv o 
Range N s > N D 




Model Target 
Mean Target 


factor Ri 10 reduction in Ns 
factor xs 5 reduction in Ng 


Low/no bias 
Low/no bias 


Applicable for N,s < N D 
Applicable for N s < N D 


Compression 




N D ~ AWa ~ 100 


Information loss 


Model dependent 




Simulation 


Factor 8 gain 


More expensive simulations 


Accuracy to be tested 


Resampling 


Jackknife 
Bootstrap 


No simulations required 
No simulations required /No bias 


Unknown bias and variance 
Multiple unknown biases 


Needs calibration 
Needs calibration 



Table 1. Summary Table of results of different methods for estimating the precision matrix for parameter estimation. The methods are 
described in detail in Section 7.2, and the results of the tests in Section 7.2.4. 



of the number data points, and the inverse of the fractional 
variance of the parameter variance (equation 58). 

• The error on the sample data covariance is equal to the 
precision matrix for small data-sets, while it scales as the 
inverse-square root of the number of data points for large 
data sets. 

If we want to have a 5% accuracy on a parameter error, 
and the number of data points is <g 100, we need « 200 reali- 
sations and a 10% error on the data covariance matrix. If the 
data set is greater than 100 points, we will need Ns ~ No 
realisations and the fractional error on the data covariance 
matrix is « 10(A^o/200) -1 ^ 2 percent. We also have shown 
how the uncertainties propagates into the Figure-of-Merit, 
and found similar conclusions. To attain high-accuracy from 
large-scale data sets seems to require equally large-numbers 
of realisations of the survey. We have explored some of the 
possible alternatives to alleviate this conclusion: 

• Theoretical Modelling: If we can accurately model 
the data covariance we avoid the Wishart bound. 

• Shrinkage Estimators: We reduce the required sam- 
ple by combining empirical or theoretical estimates of the 
precision or data covariance matrix with sample estimates. 

• Data Compression: We can reduce the number of 
data points, No, in principle to the number of parameters. 

• Simulations mode resampling: We can rapidly gen- 
erate external realisations, but should check for accuracy. 

• Data Resampling: The Jackknife resampling method 
may be useful on small-scales but will be biased. The Boot- 
strap method can generate large number of samples and so 
not be biased, but needs further study to avoid other biases. 

The combination of theoretical modelling and target shrink- 
age looks particularly promising and robust, but clearly 
needs to be developed and tested in more detail for applica- 
tion. 

In summary, many of the details of precision cosmology 
are still to be worked out. We have identified the estima- 
tion of the precision matrix as a key issue, for Gaussian- 
distributed data, requiring the generation of large numbers 
of realisations for large-datasets. We have investigated a 



number of possible ways forward, although the actual resolu- 
tion of this issue may require the use of multiple techniques. 
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APPENDIX A: 

COSMOLOGICAL LARGE-SCALE STRUCTURE 
DATA-SETS 

In general, cosmological data can be in any number of forms, 
and our analysis is applicable to a wide variety of data. If 
we are working with pixelised maps then the data are pixel- 
values and the data covariance matrix is the pixel-covariancc 
matrix. If we have compressed information into two-point 
power spectra or correlation functions then this forms the 
data vector, and the data covariance matrix is the field's 
four-point function. In this paper we shall assume the data 
is compressed into the two-point power spectra, although 
our formulae can be used for pixelised data, correlation func- 
tions, or higher-order correlations. Here we outline three ba- 
sic areas of large-scale structure study, and how their data- 
vectors are generated. 



Al: Galaxy Redshift Surveys 

In galaxy redshift surveys, we can predict the distribution 
of the matter overdensity field, 



5(r) 



P( r ) - (P) 

(P) ' 



(93) 



which we can compare with data about the galaxy overden- 
sity, 



8 B (r) 



n(r) — n(r) 



(94) 



were n(r) is the galaxy number-distribution and n(r) is the 
survey radial selection function. The Fourier transform of 
the matter overdensity, S(k), is 



5(k) 



-I 



a r o(r)e 



(95) 



The galaxy distribution can also be expanded in spherical 
harmonics, Si m (z), and radial Bessel functions, 5e mn (e.g., 
Heavens Ez Taylor, 1995). As we measure galaxy radial po- 
sitions with redshift, which combines the Hubble expansion 
with peculiar velocities, the galaxy distribution is changed 
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by redshift-space distortions (Kaiser, 1987; Hamilton, 1998) 
so that in Fourier space, 

*'(fc)= (b(k) + f(n m )^l)S(k), (96) 

where b(k) is a scale-dependent galaxy bias factor, f(Q m ) = 
din S /din a is the growth index. The correlation of the 
modes of the overdensity field is 

{6' g (k)6' g *(k')) = (2iv) a P° g (k)S D (k-k'), (97) 

where P gg (k) is the anisotropic redshift-space galaxy power 
spectrum. We can estimate the anisotropic redshift-space 
galaxy density power spectrum from 

^ fl (*.wo = T^— Y,\m\ 2 , (98) 

J 'modes 

k z 

where the summation is over the ./V mo dcs in the fe 3 -direction. 
The data is the discretely sampled redshift-space power 
spectrum; 

A = Pggik,), (99) 

where a hat ~ indicates the observed estimate of the power. 
A2: CMB 

In Cosmic Microwave Background experiments we can de- 
fine the temperature fluctuations as — AT/T, and the 
temperature power-spectrum, Cj T , is defined by 

{QemQe'm'} = Cj T 5 u ,5 mm , . (100) 

If polarisation data is added to this, in the form of E— and 
B— modes, we can construct 6 power spectra, 

d = (6T, cf E , c BB ,cJ E , cJ B ,cf B ), (ioi) 

which can form our data-vector. We can estimate these 
cross-spectra from 

Cf Y = E X t,™YZm, (102) 

where (X, Y) = (0, E, B), and we have summed over all az- 
imuthal m-modes for each I. Again, in practise these power- 
spectra would be convolved by the survey mask (e.g., Brown, 
Castro & Taylor, 2005). 

A3: Weak Lensing 

In weak lensing surveys the data can be the estimated shear 
values, 7i(0, z), where i — (1,2) are the two orthogonal 
modes of the shear. The shear-shear covariance matrix, for 
an unmasked survey, is 

( 7i (£,z) 7 *(£',z')) = (2k) 2 C7?(£, z ,' z )5 d (£-£') (103) 

where C??(£, z, z') is the shear power-spectrum. The shear 
can be decomposed into a potential (k, convergence) and 
curl (/3) part, 

k{£, z) + i0(£, z) = e 2ilpt (71 + ij 2 )(£, z), (104) 

where tpe is the angle between the wavevector, £, and 
an axis of the coordinate system the shear is measured 
in. This decomposition generates three power spectra, 

C ,KK (^z,z'),C ,3 ' 3 (^,z,z'),C" I/3 (^2^')- In Principle, we can 
also add a magnification field, /i, estimated from the size of 



galaxy images (e.g., Schmidt et al., 2012; Casaponsa et al., 
2012), yielding the magnification power, C^{1, z, z'). The 
data, compressed into these power spectra, is then 

d = (d^{e,z,z'),d KK (£,z,z'),d^(i,z,z'), 

d" K {£, z, z'),C^(£, z, z), d^{£, z, z')) . (105) 

The estimated cross-power spectrum, at two different red- 
shifts, can be estimated by 

df Y (z, z') = - J— V X{£, z)Y\£, z'), (106) 

J V modes 

\t\=t 

where (X, Y) — (fi, k, /3), and we have summed over JV mo d es 
is a shell in ^-space. In general, the effects of a survey mask, 
due to survey geometry and bright stars, will convolve these 
spectra. We can also estimate these statistics in real-space, 
through correlation functions, or other weighted two-point 
functions such as M ap or COSEBIs (Schneider, et al., 2010; 
Asgari, et al., 2012). 

A4: Combining data-sets 

Combining data-sets can be achieved by combining the data- 
vectors of each data-set, and including all the cross-terms 
between the surveys. To do this we define a vector for field- 
values; 

(z),f3 em (z)) ,(107) 

where we have chosen to expand the lensing shear, magnifi- 
cation and galaxy redshift fields into spherical harmonics for 
consistency. We then form all of the auto-and cross-spectra 
of these fields, 

c ? iXi (*>*') = ^ (los) 

m 

where we have summed over all azimuthal modes on the 
sky. The data-vector, D, is then the vector of all auto- and 
cross-spectra. 



APPENDIX B: 

PROPERTIES OF THE WISHART AND 
INVERSE- WISHART DISTRIBUTIONS 

Many of the properties of the Wishart and Inverse- Wishart 
distributions reside in technical mathematical statistics pa- 
pers and specialist textbooks. Few proofs written for physi- 
cists are available, so in this Appendix we present our own 
derivations of some useful results used in this paper. Most of 
the results, if not the details, can be found in Press (1982). 

Bl. Wishart distribution 

Let Vbeapxp symmetric matrix so that 

n 

V = ^2x a xi, (109) 

a=l 

where a; is a vector drawn from a multivariate Gaussian 
distribution, and each of n realisations is sampled indepen- 
dently. The distribution of V is given by (Wishart, 1928) 
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P (V\'E) = c\V\ < - n - p - 1)/2 \'S\- n/2 e-i TrV ' s \ (110) 
where 

c= [2" p / 2 r p [n/2]]~\ (111) 

and £ sets the scale of the distribution. The Multivariate 
Gamma Function, T p (a), is defined as 

dX \x\-^)/\-^X 



r„(a) 



(112) 
(113) 



where X is a p x p matrix and the matrix integration is over 
all positive-definite elements, 

v v 

dX = \\\\dX^, (114) 

i=l j = l 

or for a symmetric matrix over all non-repeated elements, 



dx=iij[dx K 



(115) 



i=i j=i 



B2. Derivation of the Wishart distribution 

We can derive the Wishart distribution through its char- 
acteristic function, the Fourier transform of the probability 
distribution function, 



■(J) / = V>(V|E)e* Tr ^ = ^ TrJV ^. 



(116) 



Expanding the Fourier exponential in a Taylor series and 
taking expectations we find, 

4>{J) = 1 + i(Tr(JV)) - ^{(TtJV)(TtJV)) + ■■■. (117) 

Using the Gaussian properties of x, the first and second 
moments of V are 



(118) 



{Vij Vmn) 



(119) 



Taking the expectation values, we can re-write the series for 
the characteristic function in terms of the scale matrix, S, 
as 

4>{J) = 1 + mTr( JS) 

-i (2nTr(JEJE) + n 2 (Tr J£)(TrJ£)) 
+ •••. (120) 
Collecting terms in equation (120) in powers of n we find 
(j>{J) = 1 + n[iTr( JS) - Tr[(J£) 2 ] + ■ ■ ■] 

+ in 2 [iTr(JS) - Tr[(J£) 2 ] + ■ ■ -] 2 + ■ • • . (121) 

Each of the series with factor n, n 2 , etc, can we summed 
to a logarithm, using the series relation ln(l + x) = 



Er=o(- 1 ) n+1 ^/«, which yields 

2 

<t>(J) = 1- ^Trln(/-2aS) + ^[Trln(I-2aS)] 2 
2 8 



(122) 



This last series can now be summed, using e x — 

<j>{J) = exp (-^Trln(I-2iJ£)) . (123) 

Using the matrix identity, In det A = Tr In A, we find the 
characteristic function for V is 



<t>{J) = \I - 2iJY,\- n/2 . 



(124) 



We can show the Wishart distribution has the same 
characteristic function by direct integration; 

J>(J) = J dV P (V\S)e^ JV 

= C |E|-™/ 2 y d y|y|("-p-i)/2 e -^TrVS- 1 [/-2 l JS]^ 

(125) 

It is convenient to define the new matrix variable 

© = E _1 [J — 2iJ£], (126) 



and the matrix 



X = -V®. 

2 



(127) 



The transformation of the matrix volume element is given 
by 

dV = 2 p{p+1)/2 \@\- (p+1)/2 dX. (128) 
We can now rewrite the characteristic function as 



4>{J) 



c\I -2iJT,\- n ' 2 2 pn ' 2 j dX\X 



= |J-2iJS| 



-n/2 



|(„- P -l)/2 e -TrX 



(129) 



where we have made use of the Multivariate Gamma Func- 
tion (equation 113) to cancel terms in c. Identifying <fi(J) 
as the characteristic function of V from equation (124), we 
confirm that the Wishart distribution, p(V|S), is the prob- 
ability distribution for the matrix V . The moments of the 
Wishart distribution can be found directly from the expan- 
sion of the characteristic function in equation (120). 

B3. Inverse- Wishart distribution 

The Inverse- Wishart distribution may be found from the 
Wishart distribution by a change of variables. We first note 
that the Jacobian for the transformation U = V^ 1 , where 
U and V are pxp symmetric matrices is (see Appendix C3) 



dV =\u\- (p+1) du. 

If we further define G = S~ then 
p(U\G)dU = p{V\H)dV 



(130) 



c\V 



(n-p-l)/2. 



-n/2 



e 2 



TrVXT 



dV, 



| [7 |-(„-p-i)/2| G |„/2 e -iTrl7 X G 
x\U\- (p+1) dU, 
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= c \U\- (n+p+1)/2 \Gr /2 e-? TrU ~ lG dlXl3l) 

Hence, the Inverse- Wishart distribution is 

p(U\G) = c |[/|-("+P+ 1 )/ 2 |Gr/ 2 e-3 Trt/_lG . (132) 

We should note that we have assumed the number-of-degrees 
of freedom, n, is the same for the Wishart and Inverse- 
Wishart, to simplify the derivation. However, the Inverse- 
Wishart can be parameterized differently, and different au- 
thors choose different relations between the Wishart and 
Inverse- Wishart degrees-of-freedom. If we let in be the num- 
ber of degrees-of-freedom for the Inverse- Wishart, we have 
set m — n. Other choices commonly used are m = n + p — 1, 
m = n + p+ 1, or m = n — p — 1. For example if we assume 
m — n — p — 1 (Press, 1982), we would write the Inverse- 
Wishart distribution as 



p(C/|G) = c |G|(' l -f- 1 '/ 2 |C7|- 



,/ 2 -iTrGLT" 1 



where 



co = 2 



p(n-p-l)/2 



r„[(n-p-l)/2]] 



(133) 



(134) 



The moments of the Inverse- Wishart can be found by direct 
integration over the Inverse- Wishart distribution. 



or with indices, Yij = A ik X k j the matrix volume-elements 
for a non-symmetric matrix are 



dX 



n 



JJdX i: 



.3=1 



(140) 



Each of the sub-vectors transforms as a vector, so that 

p v 

JJdy^^lAlJJdXi,-, (141) 

3 = 1 3=1 

as each sub- vector has the same determinant of A. The ma- 
trix volume-elements then transform as 

dY = \A\ p dX. (142) 

If we consider now the linear transformation 

Y = AXB, (143) 

or Yij — AikXkiBij, were X is p x q and B is q x q, we can 
apply our transformation rules twice to see that 

dY = \A\ p \B\ q dX. (144) 

If B = A 1 , and q = p we find for a general matrix 

dY = |A! 2p dX. (145) 



APPENDIX C: 

CHANGE OF RANDOM MATRIX VARIABLES 

In many cases we want to know how to change random 
matrix variables, for example in order to carry out matrix 
integration and to derive the Inverse- Wishart distribution. 
Here we present some useful matrix transformation rela- 
tions, without proof. 



CI: Vector transformations 

We first consider vector integration and change of vari- 
able. For a p-dimensional vector x, the infinitesimal volume- 
clement is 



dx = d p x — J^J dxi . 



(135) 



For a vector, x which is related to the vector y by the linear 
transformation , 



y = Ax, 



(136) 



or with indices yi = AijXj, and the volume element trans- 
forms as 



C3: Transformation of symmetric matrices 

If the matrix X is a p x p symmetric matrix, X — X f , and 
we consider a transformation of the form 



Y = AX A 1 

then 

dY = \A\ p+1 dX, 



(146) 



(147) 



when | A\ / 0. In addition, if X is symmetric and we multi- 
ply by a scalar , a, so that 



Y = aX, 
then 

dY = a p(p+1)/2 dX. 



(148) 



(149) 



Finally, if we want to transform to the inverse of a symmetric 
matrix, so that 



X 'XX '. 



Y = X 

then 

dY = \X\- (p+1) dX 



(150) 
(151) 



JJdy i = |A|JJtte J - 

i=l j=l 

or equivalently 
dy = |A|da:. 



(137) 



(138) 



C2: Transformation of non-symmetric matrices 

If we transform a p x p matrix X to the matrix Y , 

Y = AX, (139) 
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